THE EVALUATION OF AN EDUCATIONAL PROGRAM TYPICALLY 
IMPLIES MEASUREMENT. MEASUREMENT, IN TURN, IMPLIES TESTING IN 
ONE FORM OR ANOTHER. IN ORDER TO CARRY OUT THE TESTING 
NECESSARY FOR THE EVALUATION OF AN EDUCATIONAL PROGRAM, 
RESEARCHERS OFTEN DEVELOP A COMPLETE TESTING SUB-PROGRAM. THE 
EVALUATION OF THE TOTAL PROJECT MAY DEPEND UPON THE TESTING 
SUB-PROGRAM. IF THE TESTING PROGRAM IS SOMETHING LESS THAN 
ADEQUATE, THE EVALUATION OF THE TOTAL PROJECT MAY BE SUSPECT. 
RESEARCHERS SHOULD PAY AS MUCH ATTENTION TO THE EVALUATION OF 
A TESTING SUB-PROGRAM AS THEY DO TO THE EVALUATION OF THE 
TOTAL PROJECT. THE PROPOSED MODEL FOR EVALUATING A TESTING 
SUB-PROGRAM INCLUDES THE FOLLOWING STEPS WHICH WERE ADAPTED 
FROM A GENERAL EVALUATION MODEL BY C. M. LINDVALL-- (1) DEFINE 
THE UNIQUE OBJECTIVES OF THE TESTING PROGRAM, (2) DEFINE THE 
TESTING PROGRAM WITH REGARD TO PERSONNEL AND FACILITIES, 
PLANNED AND ACTUAL FUNCTIONS AND PRODUCTS, (3) PLAN AND CARRY 
OUT EVALUATION OF THE TESTING PROGRAM CONCURRENT AND 
CONSISTENT WITH THE TOTAL PROGRAM EVALUATION. THIS WOULD 
INCLUDE OBJECTIVELY ASSESSING THE ACHIEVEMENT OF THE TESTING 
PROGRAM OBJECTIVES AND OBSERVING ANY UNPLANNED RESULTS OF THE 
TESTING PROGRAM, AND (4) ATTACH A VALUATION TO THE TESTING 
PROGRAM TO ANSWER THE QUESTION, "CAN AN EVALUATION OF THE 
TOTAL PROJECT BASED UPON THIS TESTING PROGRAM BE CONSIDERED 
SOUND?" THIS PAPER WAS PRESENTED AT THE AMERICAN EDUCATIONAL 
RESEARCH ASSOCIATION CONVENTION, CHICAGO, ILL., FEBRUARY 9, 
1968. (AUTHOR) 

Well planned education and curriculum innovations include comprehensive
evaluation activities as an integral part of the project.

Evaluation in such a program implies measurement of outcomes on a number 
of dependent variables; this measurement implies testing in one form 
or another. Typically the most relevant and meaningful measurement 
devices that can be used for this evaluation are tests designed 
specifically for the curriculum innovation being studied. In a large 
scale program, the rather formidable task of producing these tests may 
be accomplished by a staff of test construction specialists — a sub-group 
within the larger project. The testing program designed by this test 
construction group provides for the assessment of pupil performance within 
the educational innovation. Since pupil performance is usually a major 
criterion for evaluating the entire project, an accurate assessment of 
that performance is essential to the project evaluation. In other 
words, the evaluation of an entire innovation is often dependent upon 
measurements made by a testing sub-program. If these measurements are not 
meaningful or reliable, then the evaluation may be subject to question. 

At this point a rationale for evaluating the testing sub-program becomes 
apparent. It is important for the researcher who is interested in the worth 
of the total project to first know the worth of his instruments. The 
evaluation of the testing program is a necessary pre- or co-requisite 
for a sound total project evaluation. 

Paper presented at the annual meeting of the American Educational Research 
Association, Chicago, Illinois, February 9, 1968. 

Procedures for studying the testing program may include many of the 
principles and practices of evaluating educational programs in general. 
Ideally, the testing program study should be concurrent with and equal 
in magnitude to the total project evaluation. Figure 1 outlines a model 
or generalized plan for accomplishing the evaluation of a testing program. 
(This model has been adapted from a general evaluation model proposed by 
C. M. Lindvall at the University of Pittsburgh.) The line of small 
boxes at the top of the page represents a total project evaluation. 

The testing program study that is the topic of this paper parallels the total 
project study. The large boxes show the four major components or phases 
of the testing program evaluation. The arrows between the boxes are the 
connecting links; they represent questions the evaluator asks about the 
information he collects within the four components. Some of these 
questions are elaborated below the model. They are arbitrary, and they 
represent the most subjective aspects of the evaluation, but the evaluator 
must attempt to answer them impartially and support his answers with objective 
evidence. The various procedures in the model and some details will now 
be elaborated. 

The first step in this procedural model is to define the testing program 
objectives. These objectives must be expressed in quite unambiguous, 
operational terms so that their achievement can be assessed. They must 
be consistent with the total project goals, and, in addition, they should 
elaborate the unique functions of the testing program and define its role 
in the project. It cannot be emphasized enough that the objectives 
must be operational. For example, it is not sufficient to say that an 
objective of the testing program is to write "good" achievement tests 
for the project. The type and content of these achievement tests must 
be specified, and criteria for what will be called a "good" test must 
be defined in terms of validity, reliability, and item characteristics.
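
As a rough illustration of what an operationally stated objective might look like, the sketch below records the criteria for a "good" test as explicit numeric limits and reports which ones a given test fails to meet. The class name, field names, and every threshold value are hypothetical; the paper does not prescribe particular figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityStandard:
    """Operational definition of a 'good' test. All limits are hypothetical."""
    min_validity: float            # lowest acceptable validity coefficient
    reliability_range: tuple       # (low, high) acceptable reliability
    item_difficulty_range: tuple   # (low, high) acceptable proportion correct


def unmet_criteria(standard, validity, reliability, item_difficulties):
    """Return the names of the criteria that the test does not satisfy."""
    problems = []
    if validity < standard.min_validity:
        problems.append("validity")
    low, high = standard.reliability_range
    if not low <= reliability <= high:
        problems.append("reliability")
    low, high = standard.item_difficulty_range
    if any(not low <= p <= high for p in item_difficulties):
        problems.append("item characteristics")
    return problems


# Illustrative values only.
standard = QualityStandard(0.60, (0.80, 1.00), (0.20, 0.90))
print(unmet_criteria(standard, 0.72, 0.75, [0.35, 0.95, 0.60]))
# prints ['reliability', 'item characteristics']
```

Stating the limits as ranges rather than simple minimums leaves room for objectives that call for low values, as may be the case for a broad placement test.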

Essential to the evaluation of any program is a thorough description 
of the innovation to be studied. This is the second component of the 
model for the evaluation of a testing program. The evaluator should 
carefully study, observe, and define the written plan for, and the 
actual operation of the testing program. He should describe in detail 
its personnel, its facilities, and the instruments and measurements it 
produces, taking into account the relationship of the actual operation 
to the stated objectives. 

The third component of the model is what might be considered the 
heart of the evaluation — the actual assessment of the testing program’s 
outcomes. It has already been suggested that measurement of the 
achievement of the testing program objectives depends upon how the objectives 
themselves are stated. At this point a discussion of an existing testing 
program will help clarify this third component. 

One of the projects of the University of Pittsburgh’s Learning 
Research and Development Center is Individually Prescribed Instruction 
(IPI). The IPI system includes a testing sub-program which provides 
diagnostic instruments necessary for measuring pupil achievement in 
reference to the IPI curriculum. In other words, it produces achieve- 
ment tests which assess a pupil’s mastery of specific skills in the 
curriculum. The first operational objective of the IPI testing program 
is to provide achievement tests which are specifically content-referenced 
to the behavioral objectives of the IPI curricula. To assess this goal, 
a check can be made as to whether such tests exist for each curriculum 
objective. 
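
The check described above could be carried out along the following lines. This is a minimal sketch, assuming that the curriculum objectives and the objectives each test covers are available as simple identifiers; the function name and the identifiers are hypothetical and are not taken from the IPI materials.

```python
def uncovered_objectives(curriculum_objectives, test_index):
    """Return the objectives that no test claims to measure.

    curriculum_objectives: list of objective identifiers, in curriculum order.
    test_index: dict mapping a test name to the objective identifiers it covers.
    """
    covered = set()
    for objectives in test_index.values():
        covered.update(objectives)
    # Keep curriculum order so the gaps can be read against the curriculum.
    return [obj for obj in curriculum_objectives if obj not in covered]


# Hypothetical identifiers, for illustration only.
objectives = ["ADD-1", "ADD-2", "SUB-1", "SUB-2"]
tests = {
    "addition_posttest": ["ADD-1", "ADD-2"],
    "subtraction_pretest": ["SUB-1"],
}
print(uncovered_objectives(objectives, tests))  # prints ['SUB-2']
```

An empty result would indicate that the first objective, as stated, has been met. It says nothing about the quality of the tests themselves.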

Another operational objective of the testing program is to place 
pupils in proper work levels to begin study in IPI at the beginning of 
each school year or when they first enter an IPI school. Assessing the 
achievement of this objective is a little more complicated. Not only is 
it necessary to check whether placement tests exist for all levels and units 
of work, but also it is essential to find out whether the tests place 
pupils in the proper work levels. An estimate of placement test validity 
can help give an idea of how accurately it assesses pupil achievement. 

An estimate of concurrent validity can be obtained by administering, to a selected 
sample of students, both the placement test and the set of pretests 
covering the same units of work. Then, results of the placement tests 
can be compared with those of the pretests which supposedly measure 
the same skills in greater detail. Another way to find out whether 
pupils have been properly placed is to examine their work patterns during 
the first two months of school and identify cases of misplacement. If 
a pupil seems to have unusual difficulty with the work, or if he goes 
through it with extreme ease, it may be an indication that he has been 
misplaced. 
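
The comparison of placement test results with pretest results might be summarized as sketched below. The sketch assumes two simple summaries: a Pearson correlation between each pupil's placement score and pretest total, and the proportion of pupils for whom the two instruments indicate the same work level. Both function names and all of the scores are hypothetical, and the paper does not specify which statistic should be used.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2 pupils)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return sxy / sqrt(sxx * syy)


def agreement_rate(placed_levels, pretest_levels):
    """Proportion of pupils placed at the level the pretests also indicate."""
    if len(placed_levels) != len(pretest_levels) or not placed_levels:
        raise ValueError("need two non-empty level lists of equal length")
    matches = sum(p == q for p, q in zip(placed_levels, pretest_levels))
    return matches / len(placed_levels)


# Hypothetical sample of five pupils.
placement_scores = [12, 18, 25, 31, 40]
pretest_totals = [30, 41, 55, 62, 85]
print(round(pearson_r(placement_scores, pretest_totals), 3))

# Hypothetical work levels indicated by each instrument for four pupils.
print(agreement_rate(["B", "C", "C", "D"], ["B", "C", "D", "D"]))  # prints 0.75
```

Pupils for whom the two instruments disagree are natural candidates for the follow-up examination of work patterns described above.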

In general, a major portion of the assessment of any testing program 
consists of evaluating the instruments it produces. This means obtaining 
validity, reliability, and item analysis data for all such instruments 
and comparing this information with the standards established in the 
testing program objectives. For example, placement tests would be designed 
as general tests covering many skills and should, therefore, have low 
internal consistency reliabilities and low inter-item correlations. Tests 
of single skills, on the other hand, should be quite homogeneous and should 
have high internal consistency and high inter-item correlations. 
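
To make the contrast between homogeneous and heterogeneous tests concrete, the sketch below computes two common indices from a pupils-by-items score matrix: coefficient alpha (which equals KR-20 when items are scored 0 or 1) and the mean inter-item correlation. The choice of these particular indices is an assumption, since the paper does not name a formula, and the score matrices are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def _pvar(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def coefficient_alpha(scores):
    """Coefficient alpha for a list of pupil rows, each a list of item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_var_sum = sum(_pvar(col) for col in zip(*scores))
    total_var = _pvar([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores must vary")
    return k / (k - 1) * (1 - item_var_sum / total_var)


def mean_inter_item_r(scores):
    """Average Pearson correlation over all pairs of items."""
    cols = [list(col) for col in zip(*scores)]
    rs = []
    for a, b in combinations(cols, 2):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        denom = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
        if denom == 0:
            raise ValueError("every item must show some score variation")
        rs.append(cov / denom)
    return sum(rs) / len(rs)


# Invented 0/1 item scores: four pupils by three items.
single_skill = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]   # items move together
many_skills = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0]]    # items unrelated

for name, data in [("single skill", single_skill), ("many skills", many_skills)]:
    print(name, round(coefficient_alpha(data), 2), round(mean_inter_item_r(data), 2))
```

The resulting values would then be compared with whatever standards the testing program objectives lay down for each type of instrument.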

If the testing program is large and produces many instruments, 
it is almost a necessity to use computer facilities to collect and 
organize the data as well as to provide statistical analyses. If such 
facilities are available and are being used in the total project evaluation, 
they should certainly be utilized in the testing program assessment; to 
use the computer for statistical analyses of test results without detailed 
analyses of test characteristics is rather meaningless. 

Along with the objective assessment of the tests, the evaluator 
should also be concerned with the more subjective observation, description, 
and evaluation of unexpected or unplanned outcomes of the testing program. 
For example, he must be alert to notice the effect of delays in getting 
needed materials, changes in the goals of the total project, changes in 
the curriculum, or lack of communication between members of the testing 
staff and the rest of the project. All such observations should be 
recorded regularly and explicitly. 

The fourth phase of the model is the one in which the evaluator 
summarizes and interprets the information he has accumulated in the 
first three phases. He makes his interpretations in light of the 
objectives of the testing program and the goals of the total project. 

(Notice that the diagram of the model can be made into a cylinder so that 
arrow E becomes two-directional, connecting the interpretation phase, IV, 
with phase I, the objectives of the testing program.) In compiling 
the results, the evaluator attempts to establish the worth of the testing 
program: to place a valuation on it. In conclusion, the rationale for 
evaluating the testing program can be expressed in one question — Do the 
measurements made by this testing program provide a sound basis for a 
total project evaluation? The answer to this question epitomizes the 
entire evaluation study. 

A Model for the Evaluation of a Testing Program 
Nancy Jordan Unks and Richard C. Cox 
University of Pittsburgh 
Abstract 

The evaluation of an educational program typically implies measurement; 
measurement, in turn, implies testing in one form or another. 
In order to carry out the testing necessary for the evaluation of an 
educational program, researchers often develop a complete testing 
sub-program. The evaluation of the total project, therefore, may depend 
upon the testing sub-program. If the testing program is something less 
than adequate, the evaluation of the total project may be suspect. The 
point is that researchers should pay as much attention to the evaluation 
of a testing sub-program as they do to the evaluation of the total project 
of which it is an integral part. 

The proposed model for evaluating a testing sub-program includes the 
following steps: (1) Define the unique objectives of the testing program. 
These are generally subordinate to total project objectives and elaborate 
the functions of the testing sub-program. (2) Define the testing program 
with regard to personnel and facilities, planned and actual functions, 
and products. (3) Plan and carry out evaluation of the testing program 
concurrent and consistent with the total program evaluation. This would 
include objectively assessing the achievement of the testing program 
objectives and observing, perhaps subjectively, any unplanned results of 
the testing program. (4) Attach a valuation to the testing program to 
answer the question, "Can an evaluation of the total project based 
upon this testing program be considered sound?" 

* Adapted from a general evaluation model by C. M. Lindvall. 